29. Remove, Repeat

Question:

This word seems like an outlier in a certain sense, so let’s remove it and refit. Go back to text_learning/vectorize_text.py and remove this word from the emails using the same method you used to remove “sara”, “chris”, etc. Rerun vectorize_text.py and, once that finishes, rerun find_signature.py. Do any other outliers pop up? If so, what word is it? Does it seem like a signature-type word? (As before, define an outlier as a feature with importance > 0.2.)
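The two steps in this question (stripping a word from the email text, then checking for features whose importance exceeds 0.2) can be sketched roughly as below. This is only an illustration, not the course's provided code: the helper names remove_words and find_outliers, the placeholder word "outlierword", and the toy inputs are all assumptions made for the example. The removal step assumes the earlier names were stripped with plain string replacement.

```python
def remove_words(text, words):
    """Return `text` with every occurrence of each entry in `words` deleted.

    Assumption: plain substring replacement, so a word is also removed
    when it appears inside a longer word.
    """
    for word in words:
        text = text.replace(word, "")
    return text


def find_outliers(importances, feature_names, threshold=0.2):
    """Return (index, name, importance) for features strictly above `threshold`."""
    return [
        (i, feature_names[i], float(imp))
        for i, imp in enumerate(importances)
        if imp > threshold
    ]


# Toy data standing in for real email text and fitted-tree importances.
# "outlierword" is a hypothetical placeholder, not the word from the quiz.
cleaned = remove_words("sara says outlierword twice", ["sara", "chris", "outlierword"])
outliers = find_outliers([0.05, 0.7, 0.1], ["alpha", "beta", "gamma"])
print(repr(cleaned))
print(outliers)
```

In find_signature.py, the importances would presumably come from the fitted decision tree's feature_importances_ attribute and the names from the TfidfVectorizer's vocabulary (get_feature_names_out() in recent scikit-learn releases, get_feature_names() in older ones).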

INSTRUCTOR NOTE:

Special Note: Depending on when you downloaded the code provided for find_signature.py, you may need to change lines 9-10 to read

words_file = "../text_learning/your_word_data.pkl"
authors_file = "../text_learning/your_email_authors.pkl"

so that the files created by running vectorize_text.py are loaded properly.